For this assignment, I discussed theory and method of approaches with Jonas Kempf.
In this assignment, you will experiment with gradient descent and Newton's method. Complete the partial Python code included in the zip file; areas that you should fill in are marked with "TODO" comments.
You should turn in an archive containing all of your source code, and include all plots and answers to underlined questions in your problem set writeup.
All algorithms are implemented in algorithms.py. An "exact" line search algorithm (using the bisection method) is already fully implemented there; you should understand its implementation. Feel free to reuse code from your previous assignment. In algorithms.py you need to fill in the lines marked TODO:
import time
import numpy as np
def bisection( func, x, direction, eps=1e-9, maximum_iterations=65536 ):
    """
    'Exact' linesearch (using bisection method)
    func: the function to optimize. It is called as "value, gradient = func( x, 1 )"
    x: the current iterate
    direction: the direction along which to perform the linesearch
    eps: the maximum allowed error in the resulting stepsize t
    maximum_iterations: the maximum allowed number of iterations
    """
    if eps <= 0:
        raise ValueError("Epsilon must be positive")
    x = np.asarray( x )
    direction = np.asarray( direction )
    _, gradient = func( x, 1 )
    gradient = np.asarray( gradient )

    # checking that the given direction is indeed a descent direction
    if np.vdot( direction, gradient ) >= 0:
        return 0

    # setting an upper bound on the optimum
    MIN_t = 0
    MAX_t = 1
    iterations = 0
    value, gradient = func( x + MAX_t * direction, 1 )
    value = np.double( value )
    gradient = np.asarray( gradient )
    while np.vdot( direction, gradient ) < 0:
        MAX_t *= 2
        value, gradient = func( x + MAX_t * direction, 1 )
        gradient = np.asarray( gradient )
        iterations += 1
        if iterations >= maximum_iterations:
            raise ValueError("Too many iterations")

    # bisection search in the interval (MIN_t, MAX_t)
    iterations = 0
    while True:
        t = ( MAX_t + MIN_t ) / 2
        value, gradient = func( x + t * direction, 1 )
        value = np.double( value )
        gradient = np.asarray( gradient )
        suboptimality = abs( np.vdot( direction, gradient ) ) * ( MAX_t - t )
        if suboptimality <= eps:
            break
        if np.vdot( direction, gradient ) < 0:
            MIN_t = t
        else:
            MAX_t = t
        iterations += 1
        if iterations >= maximum_iterations:
            raise ValueError("Too many iterations")
    return t
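As a quick sanity check of the oracle convention, the sketch below builds a toy quadratic oracle (the `quad` helper, `H`, and `x` are illustrative values, not part of algorithms.py) and computes in closed form the step size that `bisection` should approximately return along a steepest-descent direction:

```python
import numpy as np

# Toy oracle following the "value, gradient = func( x, 1 )" convention above.
H = np.array([[1.0, 0.0], [0.0, 30.0]])

def quad(x, order):
    value = 0.5 * x @ H @ x
    return value if order == 0 else (value, H @ x)

x = np.array([4.0, 0.3])
_, g = quad(x, 1)
direction = -g  # steepest-descent direction

# For a quadratic, the step minimizing t -> f(x + t*direction) has the closed form
# t* = -(g . direction) / (direction . H direction); bisection( quad, x, direction )
# should return approximately this value.
t_star = -(g @ direction) / (direction @ H @ direction)
print(t_star)
```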
###############################################################################
def newton( func, initial_x, eps=1e-5, maximum_iterations=65536, linesearch=bisection, *linesearch_args ):
    """
    Newton's Method
    func: the function to optimize. It is called as "value, gradient, hessian = func( x, 2 )"
    initial_x: the starting point
    eps: the threshold on the Newton decrement used in the stopping criterion
    maximum_iterations: the maximum allowed number of iterations
    linesearch: the linesearch routine
    *linesearch_args: the extra arguments of linesearch routine
    """
    if eps <= 0:
        raise ValueError("Epsilon must be positive")
    x = np.asarray( initial_x.copy() )

    # initialization
    values = []
    runtimes = []
    xs = []
    start_time = time.time()
    iterations = 0

    # newton's method updates
    while True:
        value, gradient, hessian = func( x, 2 )
        value = np.double( value )
        gradient = np.matrix( gradient )
        hessian = np.matrix( hessian )

        # updating the logs
        values.append( value )
        runtimes.append( time.time() - start_time )
        xs.append( x.copy() )

        # the Newton update direction: solve H d = -g (cheaper and more stable
        # than forming the inverse explicitly)
        direction = -np.linalg.solve( hessian, gradient )

        # the Newton decrement: lambda = sqrt( g^T H^{-1} g ) = sqrt( -g^T d )
        newton_decrement = np.sqrt( -gradient.T @ direction )

        # stop criterion
        if newton_decrement <= eps:
            break

        t = linesearch( func, x, direction, *linesearch_args )
        x += t * np.asarray( direction )

        iterations += 1
        if iterations >= maximum_iterations:
            raise ValueError("Too many iterations")

    return ( x, values, runtimes, xs )
The python script main_quadratic.py creates a contour plot for the quadratic objective $$ f(x) = \tfrac{1}{2} x^T H x + b^T x $$ (contained in hw6_functions.py), and draws the iterates of gradient descent and Newton's method on top of it.
######
###### This script compares different optimization methods on optimizing
###### a quadratic function of the form f(x) = 1/2 * x'Hx + x'b
######
%matplotlib inline
import numpy as np
import hw6_functions as hw
import algorithms as alg
import plotly.graph_objects as go
import plotly.io as pio
# This ensures Plotly output works in multiple places:
# plotly_mimetype: VS Code notebook UI
# notebook: "Jupyter: Export to HTML" command in VS Code
# See https://plotly.com/python/renderers/#multiple-renderers
pio.renderers.default = "plotly_mimetype+notebook"
# the hessian matrix of the quadratic function
H = np.matrix('1 0; 0 30')
# the vector of linear coefficient of the quadratic function
b = np.matrix('0; 0')
# Choose the quadratic objective. This notation defines a "closure", returning
# an oracle function which takes x (and order) as its only parameter, and calls
# obj.quadratic with parameters H and b defined above, and the parameters of
# the closure (x and order)
func = lambda x, order: hw.quadratic( H, b, x, order )
# Start at (4,0.3), with an identity inverse Hessian
initial_x = np.matrix('4; 0.3')
initial_inverse_hessian = np.eye( 2 )
# Find the (1e-4)-suboptimal solution
eps = 1e-4
maximum_iterations = 65536
# Run the algorithms
x, values, runtimes, gd_xs = alg.gradient_descent(func, initial_x, eps, maximum_iterations, alg.bisection )
x, values, runtimes, newton_xs = alg.newton(func, initial_x, eps, maximum_iterations, alg.bisection )
#Z = hw.draw_contour(func, gd_xs=gd_xs, newton_xs=newton_xs)
#fig = go.Figure(data =
# go.Contour(
# z = Z,
# contours_coloring = 'lines',
# line_width = 2,
# ))
#fig.show()
hw.draw_contour(func, gd_xs=gd_xs, newton_xs=newton_xs)
(b) How do you explain the number of iterations in Newton’s method?
Since Newton's method uses information from both the gradient and the Hessian, it converges to the minimum in a single iteration on this quadratic. Gradient descent, which has only first-order information about the function, needs 55 iterations to converge.
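This one-iteration behavior can be verified directly: for a quadratic, a full Newton step (t = 1) from any starting point lands exactly on the minimizer $-H^{-1}b$. A quick check using the same $H$, $b$, and starting point as the script above:

```python
import numpy as np

# For f(x) = 1/2 x^T H x + b^T x, the Newton step from any x0 is
# x1 = x0 - H^{-1}(H x0 + b) = -H^{-1} b, i.e. the exact minimizer.
H = np.array([[1.0, 0.0], [0.0, 30.0]])
b = np.array([0.0, 0.0])
x0 = np.array([4.0, 0.3])

gradient = H @ x0 + b
x1 = x0 - np.linalg.solve(H, gradient)
print(x1)                      # the minimizer
print(np.linalg.solve(H, -b))  # -H^{-1} b, for comparison
```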
(c) Play around with different Hessian matrices, and explore the impact of various choices on
the performance of gradient descent. Describe your results.
# the hessian matrix of the quadratic function
H = np.matrix('100 0; 0 300')
# the vector of linear coefficient of the quadratic function
b = np.matrix('0; 0')
func = lambda x, order: hw.quadratic( H, b, x, order )
# Start at (4,0.3), with an identity inverse Hessian
initial_x = np.matrix('4; 0.3')
initial_inverse_hessian = np.eye( 2 )
# Find the (1e-4)-suboptimal solution
eps = 1e-4
maximum_iterations = 65536
# Run the algorithms
x, values, runtimes, gd_xs = alg.gradient_descent(func, initial_x, eps, maximum_iterations, alg.bisection )
x, values, runtimes, newton_xs = alg.newton(func, initial_x, eps, maximum_iterations, alg.bisection )
hw.draw_contour(func, gd_xs=gd_xs, newton_xs=newton_xs)
# the hessian matrix of the quadratic function
H = np.matrix('8 0; 0 5')
# the vector of linear coefficient of the quadratic function
b = np.matrix('0; 0')
func = lambda x, order: hw.quadratic( H, b, x, order )
# Start at (4,0.3), with an identity inverse Hessian
initial_x = np.matrix('4; 0.3')
initial_inverse_hessian = np.eye( 2 )
# Find the (1e-4)-suboptimal solution
eps = 1e-4
maximum_iterations = 65536
# Run the algorithms
x, values, runtimes, gd_xs = alg.gradient_descent(func, initial_x, eps, maximum_iterations, alg.bisection )
x, values, runtimes, newton_xs = alg.newton(func, initial_x, eps, maximum_iterations, alg.bisection )
hw.draw_contour(func, gd_xs=gd_xs, newton_xs=newton_xs)
When the Hessian's diagonal entries are similar in value (i.e., the problem is well-conditioned), gradient descent converges in nearly as few iterations as Newton's method. As the Hessian becomes more ill-conditioned, Newton's method remains consistently fast while gradient descent slows down considerably.
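One way to quantify this is to count gradient descent iterations for Hessians of different condition numbers. The `gd_exact` helper below is a standalone sketch (exact line search on $f(x) = \tfrac{1}{2}x^T H x$), not part of the assignment code:

```python
import numpy as np

def gd_exact(H, x, eps=1e-4, kmax=100000):
    """Gradient descent with exact line search on f(x) = 1/2 x^T H x."""
    for k in range(kmax):
        g = H @ x
        if np.linalg.norm(g) < eps:
            return k
        t = (g @ g) / (g @ H @ g)  # exact step for a quadratic
        x = x - t * g
    return kmax

x0 = np.array([4.0, 0.3])
for diag in ([1.0, 30.0], [100.0, 300.0], [8.0, 5.0]):
    H = np.diag(diag)
    kappa = max(diag) / min(diag)
    print(f"condition number {kappa:5.1f}: {gd_exact(H, x0.copy())} iterations")
```

The iteration count tracks the condition number $\kappa = \lambda_{\max}/\lambda_{\min}$ rather than the absolute scale of the Hessian.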
The python script main_sum_exp.py creates a contour plot for the following objective (contained in hw6_functions.py): $$ f(x) = \sum_i \exp (a^T_i x + b_i) $$ for some $a_i$ and $b_i$'s, and draws the iterates on top of it.
######
###### This script compares different optimization methods on optimizing
###### a sum-exp objective of the form sum_i exp( < a_i , x > + b_i )
######
%matplotlib inline
import numpy as np
import hw6_functions as hw
import algorithms as alg
# a is the matrix of coefficients (one a_i per row)
a = np.matrix('1 3; 1 -3; -1 0')
# b is the vector of bias terms
b = np.matrix('-0.1; -0.1; -0.1')
# Choose the sum-exp objective. This notation defines a "closure", returning
# an oracle function which takes x (and order) as its only parameter, and calls
# obj.sum_exp with parameters a and b defined above, and the parameters of
# the closure (x and order)
func = lambda x, order: hw.sum_exp( a, b, x, order )
# Use backtracking line search. This defines a closure which takes three
# parameters. Note that the alpha and beta parameters to the backtracking line
# search are 0.4 and 0.9, respectively
linesearch = lambda f, x, direction: alg.backtracking( f, x, direction, 0.4, 0.9 )
# Start at (-1,1.5), with (for BFGS) an identity inverse Hessian
initial_x = np.matrix('-1; 1.5')
initial_inverse_hessian = np.eye( 2 )
# Find the (1e-4)-suboptimal solution
eps = 1e-4
maximum_iterations = 65536
# Setting the backtracking line search parameters
alpha = 0.4
beta = 0.9
# Run the algorithms
x, values, runtimes, gd_xs = alg.gradient_descent( func, initial_x, eps, maximum_iterations, alg.backtracking, alpha, beta )
x, values, runtimes, newton_xs = alg.newton( func, initial_x, eps, maximum_iterations, alg.backtracking, alpha, beta )
# Draw contour plots
hw.draw_contour( func, gd_xs, newton_xs, levels=np.arange(5, 1000, 10), x=np.arange(-3, 1.04, 0.04), y=np.arange(-2, 2.04, 0.04) )
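The routine `alg.backtracking` used above is not listed in this writeup; a minimal Armijo backtracking sketch consistent with the call signature might look like the following. (The assumption that `func(x, 0)` returns just the value, and the toy `quad` oracle, are illustrative, not taken from the assignment code.)

```python
import numpy as np

def backtracking(func, x, direction, alpha=0.4, beta=0.9, maximum_iterations=65536):
    """Armijo backtracking: shrink t by beta until the sufficient-decrease
    condition f(x + t*d) <= f(x) + alpha * t * <grad f(x), d> holds."""
    value, gradient = func(x, 1)
    slope = np.vdot(gradient, direction)
    t = 1.0
    for _ in range(maximum_iterations):
        if func(x + t * direction, 0) <= value + alpha * t * slope:
            return t
        t *= beta
    raise ValueError("Too many iterations")

# Toy quadratic oracle for a quick demonstration
H = np.array([[1.0, 0.0], [0.0, 30.0]])

def quad(x, order):
    value = 0.5 * x @ H @ x
    return value if order == 0 else (value, H @ x)

x = np.array([4.0, 0.3])
t = backtracking(quad, x, -(H @ x))
print(t)
```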
Chapter 4: Problems 2 and 3. You can code these problems in MATLAB or Python. There is no need to do part (c) of Problem 2.
import numpy as np
# Parameters
# Textbook calls m "mu"
mu = 0.01
L = 1
kappa = L / mu
n = 100
# Generating random matrix A with specific eigenvalues
A = np.random.randn(n, n)
Q, R = np.linalg.qr(A) # QR decomposition
D = np.random.rand(n, 1)
D = 10 ** D
Dmin = np.min(D)
Dmax = np.max(D)
D = (D - Dmin) / (Dmax - Dmin)
D = mu + D * (L - mu)
A = Q.T @ np.diag(D.ravel()) @ Q
# Tolerance and maximum iterations
epsilon = 1e-6
kmax = 1000
def steepest_descent(A, x0, alpha=0.01, epsilon=1e-6, kmax=1000, linesearch=False):
    # This function can use a fixed step size alpha or exact line search.
    # If exact linesearch is desired, alpha is ignored.
    x = x0.copy()
    f_values = []
    for k in range(kmax):
        gradient = A @ x
        if linesearch:
            alpha = (gradient.T @ gradient) / (gradient.T @ A @ gradient)
        # Update step
        x = x - alpha * gradient
        f_value = 0.5 * x.T @ A @ x
        f_values.append(f_value.item())
        if np.linalg.norm(gradient) < epsilon:
            break
    return x, f_values, k
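The closed-form step used when `linesearch=True` comes from minimizing the quadratic exactly along the negative gradient:

$$ \varphi(\alpha) = \tfrac{1}{2}(x - \alpha g)^T A (x - \alpha g), \qquad \varphi'(\alpha) = -g^T g + \alpha\, g^T A g = 0 \;\Longrightarrow\; \alpha = \frac{g^T g}{g^T A g}, $$

where $g = A x$ is the gradient of $f(x) = \tfrac{1}{2} x^T A x$.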
def steepest_descent_exact_line_search(A, x0, epsilon=1e-6, kmax=1000):
    x = x0.copy()
    for k in range(kmax):
        gradient = A @ x
        # Compute the optimal step size for the current gradient
        alpha = (gradient.T @ gradient) / (gradient.T @ A @ gradient)
        x = x - alpha * gradient
        if np.linalg.norm(gradient) < epsilon:
            break
    return k  # Returning the number of iterations
def heavy_ball_method(A, x0, alpha, beta, epsilon=1e-6, kmax=1000):
    x = x0.copy()
    f_values = []
    x_prev = x0.copy()  # Initialize x_{k-1} = x_0 so the first momentum term vanishes
    for k in range(kmax):
        gradient = A @ x
        # Heavy-ball update
        x_next = x - alpha * gradient + beta * (x - x_prev)
        x_prev = x
        x = x_next
        f_value = 0.5 * x.T @ A @ x
        f_values.append(f_value.item())
        if np.linalg.norm(gradient) < epsilon:
            break
    return x, f_values, k
def nesterovs_method(A, x0, alpha, beta, epsilon=1e-6, kmax=1000):
    x = x0.copy()
    y = x.copy()
    f_values = []
    for k in range(kmax):
        gradient = A @ y
        # Nesterov's update
        x_next = y - alpha * gradient
        y = x_next + beta * (x_next - x)
        x = x_next
        f_value = 0.5 * x.T @ A @ x
        f_values.append(f_value.item())
        if np.linalg.norm(gradient) < epsilon:
            break
    return x, f_values, k
num_trials = 10

# Perform optimization for each trial.
# Step size 2/(mu + L); total_iterations is reset before each experiment so the
# averages do not mix results from different step-size choices.
total_iterations = []
for trial in range(num_trials):
    # Initial point for each trial
    x0 = np.random.randn(n, 1)
    optimal_x_1, f_values_1, iterations_1 = steepest_descent(A, x0, 2/(mu + L), linesearch=False)
    total_iterations.append(iterations_1)
# Calculate the average number of iterations
average_iterations = np.mean(total_iterations)
print("Average number of iterations over", num_trials, "trials for step size 2/(m + L):", average_iterations)

# Step size 1/L
total_iterations = []
for trial in range(num_trials):
    x0 = np.random.randn(n, 1)
    optimal_x_2, f_values_2, iterations_2 = steepest_descent(A, x0, 1/L)
    total_iterations.append(iterations_2)
average_iterations = np.mean(total_iterations)
print("Average number of iterations over", num_trials, "trials for step size 1/L:", average_iterations)

# Now with exact linesearch
total_iterations = []
for trial in range(num_trials):
    x0 = np.random.randn(n, 1)
    optimal_x_3, f_values_3, iterations = steepest_descent(A, x0, linesearch=True)
    total_iterations.append(iterations)
average_iterations = np.mean(total_iterations)
print("Average number of iterations over", num_trials, "trials for exact linesearch:", average_iterations)

# Now we roll with the heavy ball
total_iterations = []
for trial in range(num_trials):
    x0 = np.random.randn(n, 1)
    optimal_x_4, f_values_4, iterations = heavy_ball_method(A, x0, alpha=4 / (np.sqrt(L) + np.sqrt(mu))**2, beta=(np.sqrt(L) - np.sqrt(mu)) / (np.sqrt(L) + np.sqrt(mu)))
    total_iterations.append(iterations)
average_iterations = np.mean(total_iterations)
print("Average number of iterations over", num_trials, "trials for Heavy-Ball Method:", average_iterations)

# And finally, Nesterov's
total_iterations = []
for trial in range(num_trials):
    x0 = np.random.randn(n, 1)
    optimal_x_5, f_values_5, iterations = nesterovs_method(A, x0, alpha=1/L, beta=(np.sqrt(L) - np.sqrt(mu)) / (np.sqrt(L) + np.sqrt(mu)))
    total_iterations.append(iterations)
average_iterations = np.mean(total_iterations)
print("Average number of iterations over", num_trials, "trials for Nesterov's Optimal Method:", average_iterations)

# Check for positive definiteness
# np.all(np.linalg.eigvals(A) > 0)
Average number of iterations over 10 trials for step size 2/(m + L): 666.8
Average number of iterations over 10 trials for step size 1/L: 783.2
Average number of iterations over 10 trials for exact linesearch: 674.4
Average number of iterations over 10 trials for Heavy-Ball Method: 543.475
Average number of iterations over 10 trials for Nesterov's Optimal Method: 456.56
Draw a plot of the convergence behavior on a typical run, plotting iteration number against $\log_{10}(f(x_k) - f(x^*))$. (Use the same figure, with different colors for each algorithm.)
import plotly.express as px
import plotly.graph_objects as go

# Compute optimal function values for comparison (at the final iterates)
f_star_1 = 1/2 * optimal_x_1.T @ A @ optimal_x_1
f_star_1 = f_star_1.item()  # extract the scalar from the 1x1 matrix
f_star_2 = 1/2 * optimal_x_2.T @ A @ optimal_x_2
f_star_2 = f_star_2.item()
# Exact Linesearch
f_star_3 = 1/2 * optimal_x_3.T @ A @ optimal_x_3
f_star_3 = f_star_3.item()
# Heavy Ball
f_star_4 = 1/2 * optimal_x_4.T @ A @ optimal_x_4
f_star_4 = f_star_4.item()
# Nesterov
f_star_5 = 1/2 * optimal_x_5.T @ A @ optimal_x_5
f_star_5 = f_star_5.item()

# Flooring at a small positive value prevents taking the log of zero
log_diff_1 = [np.log10(max(val - f_star_1, 1e-15)) for val in f_values_1]
log_diff_2 = [np.log10(max(val - f_star_2, 1e-15)) for val in f_values_2]
log_diff_3 = [np.log10(max(val - f_star_3, 1e-15)) for val in f_values_3]
log_diff_4 = [np.log10(max(val - f_star_4, 1e-15)) for val in f_values_4]
log_diff_5 = [np.log10(max(val - f_star_5, 1e-15)) for val in f_values_5]
print(f_star_5)

# Plot hack: repeat the label so px.line produces a legend entry
first_label = np.repeat("alpha = 2/(m + L)", len(f_values_1))

# Plotting with Plotly (go.Scatter replaces the deprecated go.Line)
fig = px.line(x=list(range(len(f_values_1))), y=log_diff_1, color=first_label,
              labels={'x': 'Iteration', 'y': 'log10(f(x_k) - f(x*))', "color": "Algorithm"},
              title='Convergence of Different Methods')
fig.add_trace(go.Scatter(x=list(range(len(f_values_2))), y=log_diff_2, name="alpha = 1/L"))
fig.add_trace(go.Scatter(x=list(range(len(f_values_3))), y=log_diff_3, name="Exact Linesearch"))
fig.add_trace(go.Scatter(x=list(range(len(f_values_4))), y=log_diff_4, name="Heavy Ball Method"))
fig.add_trace(go.Scatter(x=list(range(len(f_values_5))), y=log_diff_5, name="Nesterov's Method"))
fig.show()
4.035820934655251e-11
We can see from the above plot that Nesterov's accelerated method performs best, followed by the heavy-ball method. The slowest is steepest descent with step size $\alpha = \frac{1}{L}$.
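A rough theoretical sanity check supports this ordering. For strongly convex quadratics with $\kappa = L/\mu = 100$ (as in this experiment), the standard per-iteration contraction factors are $(\kappa-1)/(\kappa+1)$ for gradient descent and $(\sqrt{\kappa}-1)/(\sqrt{\kappa}+1)$ for the accelerated methods:

```python
import numpy as np

# Per-iteration contraction factors predicted by theory for kappa = L/mu = 100
kappa = 100
gd_rate = (kappa - 1) / (kappa + 1)                      # gradient descent
acc_rate = (np.sqrt(kappa) - 1) / (np.sqrt(kappa) + 1)   # heavy ball / Nesterov

# Iterations needed to shrink the error by a factor of 10^6:
# roughly log(1e6) / log(1/rate)
for name, rate in [("gradient descent", gd_rate), ("accelerated", acc_rate)]:
    print(name, np.ceil(np.log(1e6) / -np.log(rate)))
```

The accelerated factor predicts an order-of-magnitude fewer iterations, consistent with the gap seen in the plot.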
import numpy as np
# Parameters
# Textbook calls m "mu"
mu = 0
L = 1
n = 100
# Generating random matrix A with specific eigenvalues
A = np.random.randn(n, n)
Q, R = np.linalg.qr(A) # QR decomposition
D = np.random.rand(n, 1)
D = 10 ** D
Dmin = np.min(D)
Dmax = np.max(D)
D = (D - Dmin) / (Dmax - Dmin)
D = mu + D * (L - mu)
A = Q.T @ np.diag(D.ravel()) @ Q
# Tolerance and maximum iterations
epsilon = 1e-6
kmax = 1000
# steepest_descent, steepest_descent_exact_line_search, heavy_ball_method, and
# nesterovs_method are the same as defined above.
num_trials = 10

# Perform optimization for each trial.
# Step size 2/(mu + L); total_iterations is reset before each experiment so the
# averages do not mix results from different step-size choices.
total_iterations = []
for trial in range(num_trials):
    # Initial point for each trial
    x0 = np.random.randn(n, 1)
    optimal_x_1, f_values_1, iterations_1 = steepest_descent(A, x0, 2/(mu + L), linesearch=False)
    total_iterations.append(iterations_1)
# Calculate the average number of iterations
average_iterations = np.mean(total_iterations)
print("Average number of iterations over", num_trials, "trials for step size 2/(m + L):", average_iterations)

# Step size 1/L
total_iterations = []
for trial in range(num_trials):
    x0 = np.random.randn(n, 1)
    optimal_x_2, f_values_2, iterations_2 = steepest_descent(A, x0, 1/L)
    total_iterations.append(iterations_2)
average_iterations = np.mean(total_iterations)
print("Average number of iterations over", num_trials, "trials for step size 1/L:", average_iterations)

# Now with exact linesearch
total_iterations = []
for trial in range(num_trials):
    x0 = np.random.randn(n, 1)
    optimal_x_3, f_values_3, iterations = steepest_descent(A, x0, linesearch=True)
    total_iterations.append(iterations)
average_iterations = np.mean(total_iterations)
print("Average number of iterations over", num_trials, "trials for exact linesearch:", average_iterations)

# Now we roll with the heavy ball
total_iterations = []
for trial in range(num_trials):
    x0 = np.random.randn(n, 1)
    optimal_x_4, f_values_4, iterations = heavy_ball_method(A, x0, alpha=4 / (np.sqrt(L) + np.sqrt(mu))**2, beta=(np.sqrt(L) - np.sqrt(mu)) / (np.sqrt(L) + np.sqrt(mu)))
    total_iterations.append(iterations)
average_iterations = np.mean(total_iterations)
print("Average number of iterations over", num_trials, "trials for Heavy-Ball Method:", average_iterations)

# And finally, Nesterov's
total_iterations = []
for trial in range(num_trials):
    x0 = np.random.randn(n, 1)
    optimal_x_5, f_values_5, iterations = nesterovs_method(A, x0, alpha=1/L, beta=(np.sqrt(L) - np.sqrt(mu)) / (np.sqrt(L) + np.sqrt(mu)))
    total_iterations.append(iterations)
average_iterations = np.mean(total_iterations)
print("Average number of iterations over", num_trials, "trials for Nesterov's Optimal Method:", average_iterations)

# Check for positive definiteness
# np.all(np.linalg.eigvals(A) > 0)
Average number of iterations over 10 trials for step size 2/(m + L): 999.0
Average number of iterations over 10 trials for step size 1/L: 999.0
Average number of iterations over 10 trials for exact linesearch: 999.0
Average number of iterations over 10 trials for Heavy-Ball Method: 801.375
Average number of iterations over 10 trials for Nesterov's Optimal Method: 772.08
import plotly.express as px
import plotly.graph_objects as go

# Compute optimal function values for comparison (at the final iterates)
f_star_1 = 1/2 * optimal_x_1.T @ A @ optimal_x_1
f_star_1 = f_star_1.item()  # extract the scalar from the 1x1 matrix
f_star_2 = 1/2 * optimal_x_2.T @ A @ optimal_x_2
f_star_2 = f_star_2.item()
# Exact Linesearch
f_star_3 = 1/2 * optimal_x_3.T @ A @ optimal_x_3
f_star_3 = f_star_3.item()
# Heavy Ball
f_star_4 = 1/2 * optimal_x_4.T @ A @ optimal_x_4
f_star_4 = f_star_4.item()
# Nesterov
f_star_5 = 1/2 * optimal_x_5.T @ A @ optimal_x_5
f_star_5 = f_star_5.item()

# Flooring at a small positive value prevents taking the log of zero
log_diff_1 = [np.log10(max(val - f_star_1, 1e-15)) for val in f_values_1]
log_diff_2 = [np.log10(max(val - f_star_2, 1e-15)) for val in f_values_2]
log_diff_3 = [np.log10(max(val - f_star_3, 1e-15)) for val in f_values_3]
log_diff_4 = [np.log10(max(val - f_star_4, 1e-15)) for val in f_values_4]
log_diff_5 = [np.log10(max(val - f_star_5, 1e-15)) for val in f_values_5]
print(f_star_5)

# Plot hack: repeat the label so px.line produces a legend entry
first_label = np.repeat("alpha = 2/(m + L)", len(f_values_1))

# Plotting with Plotly (go.Scatter replaces the deprecated go.Line)
fig = px.line(x=list(range(len(f_values_1))), y=log_diff_1, color=first_label,
              labels={'x': 'Iteration', 'y': 'log10(f(x_k) - f(x*))', "color": "Algorithm"},
              title='Convergence of Different Methods with m = 0')
fig.add_trace(go.Scatter(x=list(range(len(f_values_2))), y=log_diff_2, name="alpha = 1/L"))
fig.add_trace(go.Scatter(x=list(range(len(f_values_3))), y=log_diff_3, name="Exact Linesearch"))
fig.add_trace(go.Scatter(x=list(range(len(f_values_4))), y=log_diff_4, name="Heavy Ball Method"))
fig.add_trace(go.Scatter(x=list(range(len(f_values_5))), y=log_diff_5, name="Nesterov's Method"))
fig.show()
2.647269010597991e-10
From the iteration counts, we can see that convergence now takes much longer than before, since $f$ is only convex, not strongly convex ($m = 0$). The convergence plots also show that all of the steepest descent variants behave similarly, since step sizes that depend on $m$ no longer offer any advantage on this function.
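This is consistent with the standard worst-case rates for smooth convex (but not strongly convex) problems, which are sublinear rather than linear:

$$ f(x_k) - f^\star = O\!\left(\frac{1}{k}\right) \text{ for gradient descent,} \qquad f(x_k) - f^\star = O\!\left(\frac{1}{k^2}\right) \text{ for Nesterov's method.} $$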